Big Data Cleaning
Author
Abstract
Data cleaning is a lively subject that has played an important part in the history of data management and data analytics, and it is still undergoing rapid development. Moreover, data cleaning is considered a main challenge in the era of big data, due to the increasing volume, velocity and variety of data in many applications. This paper aims to provide an overview of recent work in different aspects of data cleaning: error detection methods, data repairing algorithms, and a generalized data cleaning system. It also includes some discussion of our efforts on data cleaning methods from the perspective of big data, in terms of volume, velocity and variety.
Similar Resources
A Data Cleaning Model for Electric Power Big Data Based on Spark Framework
Cleaning electrical power big data can improve the correctness, completeness, consistency and reliability of the data. Aiming at the difficulty of extracting a unified anomaly detection pattern and the low accuracy and continuity of anomaly data correction in the process of electrical power big data cleaning, the data cleaning model of the electrical p...
Experiences with using Data Cleaning Technology for Bing Services
Over the past few years, our Data Management, Exploration and Mining (DMX) group at Microsoft Research has worked closely with the Bing team to address challenging data cleaning and approximate matching problems. In this article we describe some of the key Big Data challenges in the context of these Bing services primarily focusing on two key services: Bing Maps and Bing Shopping. We describe i...
Qualitative Data Cleaning
Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and wrong business decisions. Data cleaning exercises often consist of two phases: error detection and error repairing. Error detection techniques can be either quantitative or qualitative, and error repairing is performed by applying data transformation script...
Research Statement Data Cleaning Algorithmic Data-cleaning Techniques
With the increasing amount of available data, turning raw data into actionable information is a requirement in every field. However, one bottleneck that impedes the process is data cleaning. Data analysts usually spend over half of their time cleaning data that is dirty (inconsistent, inaccurate, missing, and so on) before they even begin to do any real analysis. It is a time consuming and co...
Big data gets bigger: what about data cleaning as a storage service?
The success of big data solutions relies principally on the timely extraction of valuable insights from data. This will continue to become more challenging due to the growth in data volume without a corresponding increase in velocity. We advocate that storage systems of the future should include functionality to detect and harness fundamental data characteristics such as similarity and correlation...